SL3 Neural Network
1. Perceptron concepts
1. Concepts:
- transfer function, activation function, weight, threshold
- Boolean functions as perceptrons (the unit fires when sum(w_i * x_i) >= theta): AND(w1=1, w2=1, 1 < theta <= 2), OR(w1=1, w2=1, 0 < theta <= 1), NOT(w1=-1, theta=0). XOR is not linearly separable, so it needs a second unit: feed AND(x1, x2) in as a third input with weight -2 (w1=1, w2=1, w_AND=-2, theta=1); see the sketch at the end of this section
- A perceptron defines a hyperplane decision boundary in n-dimensional input space
2. How to find perceptron weights?:
- Perceptron Rule
- uses the thresholded output y_hat; guaranteed to converge only on linearly separable data
- delta w_i = (y - y_hat) * x_i, where the error term (y - y_hat) is +1, -1, or 0
- learning rate: the whole update is multiplied by a small learning rate lr to keep the steps small
- Gradient Descent
- uses the unthresholded activation a; works even when the data are not linearly separable
- What is the gradient? The generalization of the derivative to several variables; it points in the direction of fastest increase of the function
- error function to be minimized: E(w) = 1/2 * sum_d (y_d - a_d)^2, where a = sum_i w_i * x_i
3. Sigmoid function:
- Why: it makes the mapping from input to output differentiable by smoothing out the hard threshold
- What: sigmoid(a) = 1 / (1 + e^(-a))
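
A minimal Python sketch tying the above together. This is only a sketch: the firing rule sum(w_i * x_i) >= theta, the particular theta values chosen inside the ranges above, and the learning rate lr are all illustrative assumptions, not fixed by the notes.

```python
import numpy as np

def perceptron_output(x, w, theta):
    """Threshold unit: outputs 1 when the weighted sum reaches theta, else 0."""
    return 1 if np.dot(w, x) >= theta else 0

# Boolean gates as perceptrons (theta values picked inside the valid ranges):
AND = lambda x1, x2: perceptron_output([x1, x2], [1, 1], theta=1.5)
OR  = lambda x1, x2: perceptron_output([x1, x2], [1, 1], theta=0.5)
NOT = lambda x1:     perceptron_output([x1],     [-1],   theta=0)
# XOR is not linearly separable, so it reuses the AND unit as a third input.
XOR = lambda x1, x2: perceptron_output([x1, x2, AND(x1, x2)], [1, 1, -2], theta=1)
# e.g. XOR(1, 0) -> 1, XOR(1, 1) -> 0

def perceptron_rule_update(w, x, y, theta, lr=0.1):
    """One perceptron-rule step: uses the *thresholded* output y_hat."""
    y_hat = perceptron_output(x, w, theta)
    return np.asarray(w) + lr * (y - y_hat) * np.asarray(x)  # (y - y_hat) is +1, -1, or 0

def gradient_descent_update(w, x, y, lr=0.1):
    """One gradient-descent step on E(w) = 1/2 * (y - a)^2 with a = w . x."""
    a = np.dot(w, x)                                          # unthresholded activation
    return np.asarray(w) + lr * (y - a) * np.asarray(x)       # -dE/dw_i = (y - a) * x_i

def sigmoid(a):
    """Differentiable stand-in for the hard threshold, mapping into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))
```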
2. ANN & Backpropagation
1. ANN
- A perceptron is a single-layer neural network
- The underlying principle behind an ANN is to divide the problem across many units and recombine their outputs through weighted connections
2. Backpropagation:
- Weights are randomly initialized
- The output of each node is computed as sigmoid(sum_i w_i * x_i)
- Compute the error between the true output and the network's final output
- The error is passed back layer by layer, and the weights are adjusted using these propagated errors (see the sketch below)
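
A compact sketch of these steps, under assumptions the notes do not fix: a 2-2-1 layer layout, squared-error loss, sigmoid units throughout, and a learning rate of 0.5.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(2, 2))   # small random initial weights
W2 = rng.normal(scale=0.1, size=(1, 2))

def train_step(x, y, lr=0.5):
    """One forward + backward pass on a single (x, y) example."""
    global W1, W2
    # Forward pass: each node outputs sigmoid(sum of weighted inputs).
    h = sigmoid(W1 @ x)
    y_hat = sigmoid(W2 @ h)
    # Error at the output, scaled by the sigmoid derivative s * (1 - s).
    delta_out = (y - y_hat) * y_hat * (1 - y_hat)
    # Pass the error back one layer to get each hidden node's share.
    delta_hid = (W2.T @ delta_out) * h * (1 - h)
    # Adjust the weights using the propagated errors.
    W2 += lr * np.outer(delta_out, h)
    W1 += lr * np.outer(delta_hid, x)
    return float(y_hat[0])

# Usage: one update on a single training example.
train_step(np.array([1.0, 0.0]), y=1)
```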
3. Bias of ANN
1. Restriction Bias: the representational power of the network structure, i.e., the set of hypotheses we allow ourselves to consider.
1. Linear (linearly separable) functions: a single perceptron
2. Boolean: a network of threshold-like units
3. Continuous: a single hidden layer with enough hidden units
4. Arbitrary: two (or more) hidden layers
Note: ANNs are not limited to Boolean functions: with enough hidden units, a single hidden layer can also represent any continuous function, and two hidden layers can represent arbitrary functions by modeling the continuous pieces first and then adding the jumps at the seams between patches.
2. Preference Bias: given two representations the algorithm can express, which one it prefers.
1. Choose initial weights to be small random values
1. Random: provides variability, which helps avoid getting stuck in the same local minimum
2. Small: reduces complexity and helps avoid overfitting (see the sketch after this list)
2. Prefer simpler, more generalizable representations --> Occam's razor
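
A small sketch of why small initial weights mean low complexity (the scales 0.1 and 10.0 and the random seed are just illustrative choices): with small weights the activation stays near zero, where the sigmoid is roughly linear, so the unit starts out behaving simply; with large weights it saturates.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(1)
x = rng.normal(size=10)                   # one arbitrary input vector

small_w = rng.normal(scale=0.1, size=10)  # small random initial weights
large_w = rng.normal(scale=10.0, size=10) # large weights, for contrast

# Small weights keep the activation near 0, where the sigmoid is nearly
# linear; large weights push it into the saturated, highly non-linear regime.
print(sigmoid(small_w @ x))   # typically close to 0.5
print(sigmoid(large_w @ x))   # typically close to 0 or 1
```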
4. How to avoid overfitting?
- Restrict the network to a bounded number of layers and nodes, and keep the weight values small
- Use cross-validation to decide these parameters (see the sketch below)
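
One way this could look in code (the use of scikit-learn, the toy dataset, and the specific grid values are my own choices, not from the notes): cross-validation picks the hidden-layer size and the L2 penalty alpha, which together bound the number of nodes and keep the weights small.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)  # toy data

search = GridSearchCV(
    MLPClassifier(max_iter=2000, random_state=0),
    param_grid={
        "hidden_layer_sizes": [(2,), (5,), (10,)],  # bounded number of nodes
        "alpha": [1e-4, 1e-2, 1e-1],                # L2 penalty keeps weights small
    },
    cv=5,  # 5-fold cross-validation
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```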